UNISYS:

MUC-3  TEST  RESULTS  AND  ANALYSIS 


Carl  Weir,  Robin  McEntire,  Barry  Silk,  and  Tim  Finin 
Unisys  Center  for  Advanced  Information  Technology 
Paoli,  Pennsylvania 
weir@prc.unisys.com
(215)  648-2369 


INTRODUCTION 

The Unisys MUC-3 system is based on a three-tiered approach to text processing in which a novel and
quite powerful knowledge-based form of information retrieval plays a central role. The main components
of this approach are as follows:

A  Keyword-Based  Information  Retrieval  Component. 

This component predicts the occurrence of types of events in texts based on the presence
of key words and phrases.

A  Knowledge-Based  Information  Retrieval  Component. 

This component, called KBIRD in the Unisys MUC-3 system, performs the following
tasks:

•  Based  on  the  co-occurrence  of  the  predictions  made  by  the  keyword-based  analysis 
component  and  expressions  and  concepts  discovered  in  a  given  text,  it  predicts  the 
likely  occurrence  of  additional  event  types. 

•  It  locates  instances  of  predicted  event  types  in  texts. 

•  It  identifies  possible  slot  values  for  located  instances  of  events. 

[A  Linguistic  Analysis  Component.] 

Although a natural language processing component was included in the design of the
Unisys MUC-3 system as a third level of text analysis, not enough time was available
during the MUC-3 development cycle both to develop a knowledge-based information
retrieval component and to port the Unisys Pundit text-processing system to the MUC-3
terrorist domain. A decision was made to focus on developing the knowledge-based
information retrieval component and postpone the integration of Pundit until MUC-4.

A  Template  Generation  Component. 

An  application-specific  Prolog  program  was  written  to  merge  templates  describing  the 
same  event,  and  to  select  the  most  likely  slot  values  for  templates  in  cases  where  multiple 
slot  values  were  proposed. 
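
The flow through these components can be illustrated with a minimal sketch. It is written in Python for
readability and is not the Unisys implementation (the template generator was a Prolog program, and
KBIRD's rule language is not shown here). The keyword table, the co-occurrence rule, the slot names,
and the score-based selection policy are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword table: key word -> event type it predicts.
KEYWORDS = {"bomb": "BOMBING", "exploded": "BOMBING", "kidnapped": "KIDNAPPING"}

# Hypothetical co-occurrence rule: an already predicted type plus a concept
# found in the text predicts an additional event type.
COOCCURRENCE = {("BOMBING", "killed"): "MURDER"}


def predict_by_keyword(text):
    """Keyword-based component: predict event types from key words."""
    words = text.lower().split()
    return {etype for kw, etype in KEYWORDS.items() if kw in words}


def predict_additional(text, predicted):
    """Knowledge-based component, first task: add co-occurrence predictions."""
    words = set(text.lower().split())
    extra = {new for (etype, concept), new in COOCCURRENCE.items()
             if etype in predicted and concept in words}
    return predicted | extra


def merge_templates(templates):
    """Template generation: merge templates describing the same event and
    keep the highest-scoring candidate value for each slot.

    Each template is (event_key, {slot: [(value, score), ...]}).
    """
    merged = defaultdict(lambda: defaultdict(list))
    for event_key, slots in templates:
        for slot, candidates in slots.items():
            merged[event_key][slot].extend(candidates)
    return {
        key: {slot: max(cands, key=lambda c: c[1])[0]
              for slot, cands in slots.items()}
        for key, slots in merged.items()
    }


text = "A bomb exploded downtown and two people were killed"
types = predict_additional(text, predict_by_keyword(text))

# Two hypothetical templates proposed for the same event, with scored values.
templates = [
    ("BOMBING-1", {"location": [("downtown", 0.6)]}),
    ("BOMBING-1", {"location": [("San Salvador", 0.9)],
                   "target": [("bank", 0.5)]}),
]
result = merge_templates(templates)
```

Here the keyword stage predicts a bombing, the co-occurrence rule adds a murder prediction, and the
merge step collapses the two templates into one with a single value per slot.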


The Unisys MUC-3 development team consisted of two full-time Unisys staff members and one
government employee on industrial rotation. A total of 2650 person-hours were put into the project, 800
of which were contributed by the government employee. The effort was partially supported by a DARPA
grant, which covered approximately 30% of the development cost.^1 The bulk of the effort involved the
development of the KBIRD system and its MUC-3 rule base. These two tasks took approximately the
same amount of time, and in total comprised roughly 85% of the effort.


^1 Work on this project was partially supported by DARPA under contract MDA-903-89-C-0041.
Figure 1: Unisys MUC-3 System Scores

TEST  RESULTS 

The scores reported for the Unisys MUC-3 system are shown in Figure 1. The low ACT and high MIS
scores reported for the template id slot indicate that event detection was a problem.^2 Poor event detection
performance explains the relatively low recall scores reported for all but the MATCHED ONLY summary
measurement. The MATCHED ONLY recall score is a measure of performance in which spurious (false
positive) and missing (false negative) templates are not factored in. The extremely low SPU score reported
for the template id slot suggests that further training of the rule base to improve event detection will not
come at the expense of lower precision scores. In Figure 2, the performance of the Unisys system with
respect to other MUC-3 systems is indicated in two scatter plots.
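
The relationship between the tallies and the summary scores can be made concrete with a small sketch.
It assumes the MUC scoring convention in which a partially correct fill earns half credit, recall is
measured against the possible fills (POS), and precision against the actual fills (ACT). The counts
below are invented for illustration and are not the Unisys scores; the `Tally` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    cor: int = 0   # correct fills
    par: int = 0   # partially correct fills
    inc: int = 0   # incorrect fills
    mis: int = 0   # missing fills (false negatives)
    spu: int = 0   # spurious fills (false positives)

    @property
    def possible(self):
        # POS: number of fills in the answer key
        return self.cor + self.par + self.inc + self.mis

    @property
    def actual(self):
        # ACT: number of fills the system produced
        return self.cor + self.par + self.inc + self.spu

    def recall(self):
        return 100.0 * (self.cor + 0.5 * self.par) / self.possible

    def precision(self):
        return 100.0 * (self.cor + 0.5 * self.par) / self.actual

    def __add__(self, other):
        return Tally(self.cor + other.cor, self.par + other.par,
                     self.inc + other.inc, self.mis + other.mis,
                     self.spu + other.spu)


# Invented counts for slots inside templates that matched a key template.
matched = Tally(cor=40, par=10, inc=5, mis=5, spu=15)

# Invented counts for slots in templates the system never generated (all
# missing) or generated with no counterpart in the key (all spurious).
unmatched = Tally(mis=90, spu=5)

matched_only = matched                 # missing/spurious templates ignored
all_templates = matched + unmatched    # missing/spurious templates included
```

With these made-up counts, recall falls from 75 (matched only) to 30 (all templates) because of the
missed templates, while precision moves only from about 64 to 60 because few templates are spurious.
This is the pattern described above: many missed events, few false alarms.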


Since  template  slot-filling  algorithms  are  triggered  by  the  detection  of  an  event,  poor  event  detection 
performance  has  a  direct  negative  impact  on  slot-filling  performance.  The  recall  scores  for  the  Unisys 
MUC-3  system  reflect  this  fact.  However,  for  five  slots  precision  scores  are  also  low.  These  low  precision 
scores  are  not  a  consequence  of  poor  event  detection,  but  result  instead  from  a  combination  of  poorly 
trained  inference  rules  used  to  extract  the  sort  of  information  expressed  in  the  pertinent  slots,  and  bugs 
in  the  template  generation  routines  that  gather  and  merge  correctly  detected  information  into  template 
structures. 


ANALYSIS 

Contrary to what the reported low recall scores suggest, the Unisys MUC-3 system can
perform well at predicting events. The keyword-based prediction of event types is very robust; the database
used during this stage of processing was derived from the full 1300-message DEV corpus. Moreover, when
the rules used by KBIRD are properly trained, they do a very good job of locating instances of the
events predicted by keyword analysis. Unfortunately, the KBIRD locator rules used to detect instances
of events were trained on a relatively small set of messages: the 200 NOSC DEV and TST1 messages.
Consequently, even though the keyword-based analysis phase may have correctly predicted the likely
occurrence of a given event type, KBIRD may not have been able to locate an instance of the predicted
event type. Thus, KBIRD’s locator rules had a negating influence on the performance of the keyword-based
analysis phase. Prior to the final MUC-3 test, versions of the Unisys system with fewer, more
general event detection rules in place had recall scores ranging in the high 30’s and low 40’s for all the
summary measures. A tactical mistake was made in attempting to replace this general rule base with a
larger, more context-sensitive one, since there was not enough time to allow the larger rule base to be
properly trained. In the evaluation, generating spurious templates tended to have much less of an impact
on scores than failing to generate templates at all. In future evaluations, we will investigate the use of
different locator rule sets as a settable system parameter.

^2 The template id slot is scored differently from other slots: the values reported for this slot are a
measure of event detection performance (it doesn’t make sense to report system performance in generating
template ids, since the order in which templates are generated is not relevant in this task) [2].

Figure 2: The scatter plot on the left indicates the relative performance of the Unisys MUC-3 system without
taking into consideration false negative and false positive hits (the MATCHED-ONLY score). The scatter plot
on the right indicates the relative performance of the Unisys MUC-3 system when taking into consideration
both false negative and false positive hits (the ALL-TEMPLATES score).
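
The idea of treating the locator rule set as a settable parameter, and the way an undertrained rule
set can cancel a correct keyword-based prediction, can be sketched as follows. The regular-expression
rules, the rule set names, and the `locate_events` function are hypothetical stand-ins; KBIRD's actual
rules are not pattern strings of this kind.

```python
import re

# Hypothetical locator rule sets: event type -> patterns, any one of which
# must match a sentence for an instance of that event to be located.
RULE_SETS = {
    "general": {
        "BOMBING": [r"\b(bomb|explo\w+)\b"],
    },
    "context_sensitive": {
        "BOMBING": [r"\bbomb\w* (was|were) (placed|detonated)\b"],
    },
}


def locate_events(sentences, predicted_types, rule_set="general"):
    """Return (event_type, sentence_index) pairs confirmed by locator rules.

    A predicted type for which no rule matches yields nothing, so the
    locator stage can discard a correct keyword-based prediction.
    """
    rules = RULE_SETS[rule_set]
    found = []
    for etype in sorted(predicted_types):
        for i, sentence in enumerate(sentences):
            patterns = rules.get(etype, [])
            if any(re.search(p, sentence.lower()) for p in patterns):
                found.append((etype, i))
    return found


sentences = ["A car bomb exploded near the embassy."]
predicted = {"BOMBING"}  # assume the keyword stage predicted this correctly

general_hits = locate_events(sentences, predicted, rule_set="general")
specific_hits = locate_events(sentences, predicted,
                              rule_set="context_sensitive")
```

In this toy case the general rule set locates the event, while the narrower rule set, which has not
been trained on this phrasing, locates nothing and no template would be generated.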

Rule  training  was  hindered  during  the  MUC-3  development  cycle  by  the  need  to  concurrently  build  the 
component  that  would  be  using  the  rules.  In  addition  to  this  development  problem,  technical  difficulties 
in  KBIRD’s  design  began  to  appear  once  the  number  of  rules  had  grown  to  a  realistic  size.  These  technical 
problems  resulted  in  slow  message  processing  speeds,  which  further  complicated  the  rule  training  process. 
The  following  three  key  problems  were  identified: 

Heavy  use  of  forward-chaining. 

There  is  currently  too  much  reliance  on  forward-chaining  in  the  KBIRD  system.  Many 
KBIRD  reasoning  tasks  could  be  more  efficiently  achieved  in  a  backward-chaining  fashion. 

Expensive  TMS  system. 

KBIRD was built on top of a very general inferencing mechanism with an expensive
truth maintenance system (TMS). KBIRD’s needs for truth maintenance could be
accommodated using a much simpler TMS component.

Inability  to  focus  search. 

In  KBIRD,  it  is  currently  not  possible  to  focus  search  on  a  specific  region  of  text.  The 
mechanism  used  to  satisfy  a  rule  looks  for  all  chart  elements  (concepts,  words,  phrases, 
and  so  forth)  that  match  constituent  expressions  in  the  antecedent  of  a  rule.  If  the  KBIRD 
rule  specifies  that  an  element  of  a  certain  type  must  be  in  the  same  sentence  as  some  other 
element,  it  would  be  more  efficient  to  limit  the  search  space  to  just  those  chart  elements 
that  fall  within  the  span  of  the  sentence.  However,  KBIRD’s  algorithm  currently  searches 
through  chart  elements  indexed  to  locations  anywhere  in  the  text  for  suitable  candidates. 
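
The difference between unfocused and focused search over chart elements can be sketched as follows.
This is an illustration under stated assumptions, not KBIRD's data structure: the `ChartElement` and
`Chart` classes, the character-offset indexing, and the use of binary search are all hypothetical.

```python
from bisect import bisect_left
from typing import NamedTuple


class ChartElement(NamedTuple):
    start: int   # offset where the element begins
    end: int     # offset where the element ends (exclusive)
    kind: str    # e.g. "word", "phrase", "concept"
    label: str


class Chart:
    def __init__(self, elements):
        # Keep elements sorted by start offset so that a span can be cut
        # out with binary search instead of a scan over the whole text.
        self.elements = sorted(elements, key=lambda e: e.start)
        self.starts = [e.start for e in self.elements]

    def search_all(self, kind):
        """Unfocused search: every element in the text is a candidate."""
        return [e for e in self.elements if e.kind == kind]

    def search_span(self, kind, span_start, span_end):
        """Focused search: only elements lying inside the span."""
        lo = bisect_left(self.starts, span_start)
        hi = bisect_left(self.starts, span_end)
        return [e for e in self.elements[lo:hi]
                if e.kind == kind and e.end <= span_end]


chart = Chart([
    ChartElement(0, 4, "concept", "WEAPON"),
    ChartElement(10, 16, "concept", "TARGET"),
    ChartElement(12, 14, "word", "of"),
    ChartElement(50, 56, "concept", "TARGET"),   # in a later sentence
])

# Suppose the first sentence spans offsets 0 to 40.
in_sentence = chart.search_span("concept", 0, 40)
everywhere = chart.search_all("concept")
```

A rule requiring two elements to share a sentence would then examine only the two concepts returned
by `search_span`, rather than all three concepts found anywhere in the text.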
CONCLUDING REMARKS


The time constraints imposed in MUC-3 made it impossible to fully develop the Unisys MUC-3 system’s
knowledge-based information retrieval component, KBIRD, before the evaluation deadline. Consequently,
it is not possible at this time to establish the capabilities of the three-tiered approach realized in the
system. The system’s scores indicate, however, that although the rules for locating instances of events
were inadequately trained, its performance at identifying slot values once an instance has been found is
quite good.

Future work on the system will solve the technical problems that have been observed. This will be
achieved by performing the following tasks:

•  The  overall  system  flow  will  be  restructured  to  allow  backward-chaining  to  handle  more  of  the 
processing  load. 

•  The current forward-chaining mechanism will be reimplemented so that it is specifically geared to
the processing tasks envisioned for KBIRD.

•  Subject to an appropriate funding source, the KBIRD locator rules used to detect instances of
predicted event types will be properly trained.
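
The motivation for shifting work from forward-chaining to backward-chaining can be seen in a toy
comparison. The sketch below assumes simple propositional rules, which is a simplification; the rule
base, the fact names, and both functions are hypothetical and do not reflect KBIRD's rule language.

```python
# Hypothetical propositional rules: (antecedents, consequent).
RULES = [
    ({"mentions_bomb", "mentions_explosion"}, "bombing_event"),
    ({"bombing_event", "mentions_victim"}, "has_human_target"),
    ({"mentions_ransom"}, "kidnapping_event"),
    ({"kidnapping_event", "mentions_victim"}, "has_hostage"),
]


def forward_chain(facts):
    """Data-driven: derive every conclusion that follows from the facts."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


def backward_chain(goal, facts, in_progress=frozenset()):
    """Goal-driven: try to prove one goal, visiting only relevant rules."""
    if goal in facts:
        return True
    if goal in in_progress:      # guard against circular rule chains
        return False
    in_progress = in_progress | {goal}
    return any(
        all(backward_chain(a, facts, in_progress) for a in antecedents)
        for antecedents, consequent in RULES
        if consequent == goal
    )


facts = {"mentions_bomb", "mentions_explosion",
         "mentions_victim", "mentions_ransom"}

everything = forward_chain(facts)
target_proved = backward_chain("has_human_target", facts)
hostage_proved = backward_chain("has_hostage", facts - {"mentions_ransom"})
```

Forward-chaining derives all four conclusions whether or not they are needed, whereas
backward-chaining for `has_human_target` never touches the kidnapping rules.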

In  addition  to  solving  the  technical  problems  that  have  arisen  in  the  system’s  KBIRD  component,  a 
major  effort  will  be  made  to  incorporate  the  Unisys  Pundit  NLP  system  into  the  MUC-3  system. 